Poster: pegz Date: Nov 7, 2012 2:42pm
Forum: texts Subject: Omni Magazine - any proof reading?

I was really looking forward to reading these again, but the random selection I've downloaded so far has such a huge number of appalling typos as to make them virtually unreadable.
Did anyone do ANY proofreading?

Poster: pegz Date: Nov 8, 2012 1:25am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

So, a bit pointless generating all those other formats then, if it's just going to produce different flavours of rubbish. Even if PDF files were a viable option for modern recreational reading, most of these are so badly done that they are almost equally unreadable. I expected better of the archive.

Poster: readerofbooks Date: Nov 8, 2012 4:52am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Can you share a link to the issue you're talking about?

Poster: pegz Date: Nov 8, 2012 4:40pm
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Well, start with http://archive.org/details/omni-best-of-1
- page one, and work through from there!
Quote -
"Cover pointing by Pierre Lacombe

FOL.NDI few by Iscac Asir-ic-.: i i,s"al;cn by H R, Giger
IAT TELLS THE TIME fit
■y HclanFI ison. lustro~-cn by 'vlaf. -(lorwein
ci - : cl! Lee-;, fex" ba Kara een vein
Cera. Ls-'OicnoyEveyri byb'
■ by Scot Morns
siralionoy ivelya oyer
■okoloy tsx' by F C. Duront III
NO FUTURE IN IT fiction by Joe Haldeirar I . aire ;.- by C ofrti ed -e -weir,
GALATEA GALANTE fiction c-v ■■- '''ea Bey ei ; lustration by H. R. Giger

ALIEN LANDSCAPES pictorial by Les Edwards, John Harris, Terry Ookes, and Tony Roberts

KINS VAN fict on by Ben Rove, i ustronor by John Schoenherr

SPACE CITIES pictorial by Harry Harrison

HALFJACKficorioy ~'cy? ■'.<? -:-■■■.■ ■ .:.-'-■ v -:.■■- bv Ivtchsl Henricot

SANDKINGS fiction by George R. R Martin llustrafol by Ernst Fuchs

PLANET STORV pictor c ey - rr 3'ir^s a:-'d -crry Harrison

ARTHUR C.C : .ARk'E!,-i:ef,igvv a-d i ...s-ofc', ::v VcIcoItsS. Kirk



Copyriari jM9.'5. '97y. !930 by Or.n. P..Ld--:lioni- -te-e.:!cnal -id All nglrs 'ess
aiaies o : ArTon,.;. Ni ■:!.-:■■ c""""b so;* tray ::,; epr-duosd n; -'onsTii-tec n any torn- o-
'■lecfifl-ica i--:udinc bhr-.iDr.c-pvi-g (eco'OV'g. or any irfcir'iai'on a--d 'errievslsyste-r
the D_bi st'is 1 I- ■ o'" i O.T'i,- t jfjai rie (Eon Gi.ooio-ne. e:: "::' i:-uo:is-(V. ;r;i design "i'g.~;

pub.is-er) 309 II- id Avenue N. n o. ■ ! . ■ ::..'. ■ n: li- 1 ■ ■. r . a, i ■ .■!!■ ,

Canada. Library of CcngisMx oa-:;iog 73-92003 Frs; sd Iky Omni is 3 regist"

I always thought Omni was published in English!

Poster: Parsnip Date: Nov 9, 2012 2:31am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Simple OCR - no proof-reading. That is all.

The PDFs and Read Online versions are legible and pleasant to read apart from one or two places, as far as I can see, especially if you view as one page and zoom in. Bear in mind that this was a magazine printed apparently on shiny paper. I think this would make scanning, and particularly OCR, very difficult.

I suppose if you are trying to read on a tiny screen, it would be impossible, but lots of things are impossible with tiny screens.

Do you, incidentally, have any idea how long it takes to correct OCR'd text? I have done just one book for PG, and it took an age. There would be a tiny fraction of the material available on the Internet Archive if the text of every item were proof-read. I would rather have the amazing quantity and read online or in PDF in (mostly) excellent quality, than not have it at all.

Poster: pegz Date: Nov 9, 2012 2:51am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Fair enough, if you think distributing gibberish is the aim of the archive, then you are entitled to your opinion. I've never considered 'quantity over quality' to be a particularly good argument myself. My point is that if blurry PDF files are the best available, don't get us all excited by pretending to produce readable versions in other formats. Maybe you still think that sitting bolt upright in front of a big screen is a good reading experience, but the rest of the world has moved on from there.
Sadly, because it's now been 'covered' by the archive, the other archiving services like Gutenberg (who do produce excellent readable files in all formats) probably won't bother doing them properly, and they will be as good as lost forever.
Now, excuse me, I have to arrange to get my computer and 30" screen installed on the bus so I can read my magazine on the way in to town.......

Poster: stbalbach Date: Nov 9, 2012 9:38am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Project Gutenberg mines most of its texts from Internet Archive. They find a PDF and manually transcribe it to plain text through the Distributed Proofreaders project (pgdp.net). It's long hard work by teams of volunteers. Gutenberg is an end-product created from the raw material on Internet Archive (and elsewhere).

To keep this in perspective, with thousands of volunteers Gutenberg has only proofread about 25,000 texts in 12 years or so. In about the same time, Internet Archive has scanned 3 million or so books. As you can see, the real work is in the proofreading, so really *any* text that has been proofread is sort of a minor miracle. It's just a tremendous amount of effort to proofread. I sort of chuckle when people think everything should be proofread, as if that were a minor thing. It will probably take 100 years or more; it's the work of generations.

This post was modified by stbalbach on 2012-11-09 17:38:55
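
To put rough numbers on the comparison above, here is a back-of-the-envelope sketch in Python. It uses only the approximate figures quoted in the post and assumes, purely for illustration, that the proofreading rate stays constant and that no further books are scanned; the function name is hypothetical.

```python
def years_to_clear_backlog(backlog, texts_done, years_taken):
    """Years needed to proofread `backlog` texts at the observed rate.

    Assumes a constant rate of texts_done / years_taken and a fixed backlog.
    """
    return backlog * years_taken / texts_done


# Approximate figures from the post: about 25,000 texts proofread in
# about 12 years, against about 3 million scanned books.
estimate = years_to_clear_backlog(3_000_000, 25_000, 12)
print(f"Roughly {estimate:.0f} years at the current rate")
```

Under those assumptions the estimate comes out well beyond a century, which fits the "100 years or more" remark; more volunteers or better OCR would change the picture.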

Poster: Parsnip Date: Nov 9, 2012 9:03am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

May I suggest that you transcribe them for Project Gutenberg? They heartily welcome new contributions. You seem still not to realise that most of this work is done by volunteers.

Poster: pegz Date: Nov 9, 2012 12:05pm
Forum: texts Subject: Re: Omni Magazine - any proof reading?

I do realise that the proofreading is voluntary, because I am a volunteer myself :~)
I'm simply stating that all the derivative versions are, in this case, a totally pointless waste of server space, as they are unreadable.
Obviously quantity is more important than quality here. A shame really, as it's lowered my respect for the archive considerably.

Poster: Jeff Kaplan Date: Nov 9, 2012 1:24pm
Forum: texts Subject: Re: Omni Magazine - any proof reading?

our apologies.

Poster: pegz Date: Nov 11, 2012 3:33am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Firstly, thanks for the apology, I'm sorry if I went a bit overboard, it was probably mainly due to embarrassment! Like many others, I've been spreading the word about Omni being available again on several forums, but, alas, without trying to read one first. Maybe I should do some proof reading too :~) I guess my love of I.A. and Omni got the better of me.
Secondly, when I say 'proof reading', I don't expect perfection. I just would have thought that someone might have glanced at the first page of the first issue to be converted, (the one I pasted above), and thought "Hang on, something not quite right here....." before ploughing on through the rest.

Poster: pegz Date: Nov 11, 2012 3:58am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

...anyway, if the proof reading was perfect, I'd miss out on glorious lines such as 'Life is adrift in a sea of Radox'!
I know from the context it should be 'radiation', but so much more relaxing to think of it as drifting in pine scented bath salts.......

Poster: aibek Date: Nov 11, 2012 6:09am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

The OCR software could delete the obvious gibberish in the text file. It is not difficult for the software to identify it -- just notice that there are very few dictionary words amongst all the characters on the page. This would take care of the cases where whole files contain no useful word at all (e.g. with Sanskrit texts).

But it is computationally expensive! What is not even worthy of notice for one file -- like deleting obvious gibberish -- becomes something to consider when you have hundreds or thousands of files, even if the algorithm is straightforward. In the IA case, with all sorts of “derivations” of texts and audio and video of probably thousands of files simultaneously (all of which are computationally intensive) the developers are probably not free to do all that they may wish to!

But if the engineers have not considered this, someone should draw their attention to this! Note that the unedited text file is not a “faithful representation” which should not be touched; the PDF file, or more exactly, the tiff images, are the “faithful representation”.

This post was modified by aibek on 2012-11-11 14:09:41
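
The dictionary-word check suggested above might look something like the following minimal Python sketch. It is not how the Archive's OCR pipeline works; the tiny word set, the function names, and the 0.3 threshold are all assumptions chosen for illustration, and a real check would load a full word list.

```python
import re

# Hypothetical stand-in for a real dictionary file.
SAMPLE_WORDS = {
    "a", "and", "by", "in", "is", "it", "of", "the", "to",
    "alien", "cities", "fiction", "landscapes", "pictorial", "space",
}


def dictionary_ratio(text, words=SAMPLE_WORDS):
    """Return the fraction of alphabetic tokens in `text` found in `words`.

    Text with no alphabetic tokens at all scores 0.0.
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in words)
    return hits / len(tokens)


def looks_like_gibberish(text, threshold=0.3, words=SAMPLE_WORDS):
    """Flag text whose share of dictionary words falls below `threshold`."""
    return dictionary_ratio(text, words) < threshold


# Two lines taken from the OCR sample quoted earlier in the thread.
print(looks_like_gibberish("SPACE CITIES pictorial by Harry Harrison"))
print(looks_like_gibberish("siralionoy ivelya oyer"))
```

Proper names such as "Harry Harrison" count against the ratio here, which is one reason the threshold is kept low; the check is meant to catch text that is almost entirely noise, not to judge individual words.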

Poster: Jeff Kaplan Date: Nov 11, 2012 8:55am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

we're well aware that, to be generous, OCR is less than perfect. and you're correct that at scale we would not be able to manually review or correct the abbyy OCR. at this point the main suggestions are to crowdsource (which we do not have the manpower to manage) or have interested folks upload corrected files to new items and use good metadata so that the corrected file items appear high in search results.

Poster: aibek Date: Nov 8, 2012 6:00am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Pegz probably means this:
http://archive.org/details/omni-magazine

The OCR is very useful in most cases. For almost all the books I have come across, the result is excellent. I have read a couple of 1000-page books using IA's OCR copy.

I must add that I checked a random Omni Magazine page:
http://archive.org/stream/omni-magazine-1986-10/OMNI_1986_10_djvu.txt

It is quite readable. From this I conclude that the corresponding PDF file for this is fine too.

This post was modified by aibek on 2012-11-08 14:00:34

Poster: pegz Date: Nov 11, 2012 6:50am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

That's the edition that includes the chapter heading 'ARTIFICIAL IfUTELLIGERJCE'
:-D

Poster: Parsnip Date: Nov 7, 2012 3:31pm
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Simple answer is no. Try the 'Read Online' or PDF options, which are actual scans.

Poster: aibek Date: Nov 7, 2012 5:25pm
Forum: texts Subject: Re: Omni Magazine - any proof reading?

The text files are generated via OCR of the uploaded PDF files.

Poster: mattwj2005 Date: Nov 10, 2012 11:04pm
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Archive.org is a great source for all types of media. Public domain works of text are one of them.

Wikisource is another project that does proofreading. If you want, come on over to Wikisource and help out.

That is my two cents.

Thanks,

Matt

Poster: stbalbach Date: Nov 11, 2012 1:55am
Forum: texts Subject: Re: Omni Magazine - any proof reading?

Wikisource has become overly complex and specialized. What was accomplished is technically impressive, but it's just not going to attract many editors due to the drudge work required. It sort of feels like the project was strangled in order to achieve perfection. I liked the old days of free-form HTML and a more relaxed attitude to project scope.

(Wikisource is still a great place, and I do recommend it to anyone as a place to proofread texts.)

This post was modified by stbalbach on 2012-11-11 09:55:16

Poster: garthus1 Date: Nov 14, 2012 2:50pm
Forum: texts Subject: Re: Omni Magazine - any proof reading?

To all,

Do not forget Distributed Proofreaders; they do the proofing for Project Gutenberg.

I do not know what the problem is with those scans; in time when OCR gets better, they can be re-derived and new text files generated. I would welcome anything that anyone puts up and if that is not good enough others will eventually come along and submit better scans. All of my scans are high quality but that may be more a function of having good equipment and better software. Since we do not know the situation of those others who submitted the scans in question, criticism is unwarranted. Perhaps they were doing the best that they could and in that case what we have is far better than nothing. I am with the group which thinks that getting as much up as possible as fast as possible is the best approach. Once the materials are in the Public Domain it will be much more difficult for our government hacks to change copyright rules and remove Public Domain items. Generally, most Archive submissions are 'very' good considering that they come from volunteers almost exclusively. I do not really understand the criticism; I am thankful for anything that our volunteers upload … the forum is the place to discuss how to make things better. Most people will want their name to be associated with good work. I have already resubmitted items which I thought to be of inferior quality but of great importance to be in the Archive. That is the route which we should all be taking.

Gerry